The 2016 U.S. presidential election brought several unexpected results that reshaped the vision many Americans had for the country. Analysts monitored states closely to gauge whether they would swing toward Hillary R. Clinton (D) or Donald Trump (R). Texas stood out in this election because, despite voting consistently Republican since 1980, many Texan voters did not default to the Republican Party candidate. Although Donald Trump ultimately won Texas, he did so by a slimmer margin than in the 2012 election. By analyzing campaign contributions, I hope to better understand how donations reflected the political atmosphere of Texas at the time of the election.
The dataset was obtained from the Federal Election Commission site. It has 19 variables and almost 540,000 observations (539,546 rows). Below are the dimensions and summary of the data.
## [1] 539546 19
## cmte_id cand_id cand_nm
## C00575795:203851 P00003392:203851 Clinton, Hillary Rodham :203851
## C00574624:138805 P60006111:138805 Cruz, Rafael Edward 'Ted':138805
## C00577130: 79954 P60007168: 79954 Sanders, Bernard : 79954
## C00580100: 69288 P80001571: 69288 Trump, Donald J. : 69288
## C00573519: 23692 P60005915: 23692 Carson, Benjamin S. : 23692
## C00458844: 8983 P60006723: 8983 Rubio, Marco : 8983
## (Other) : 14973 (Other) : 14973 (Other) : 14973
## contbr_nm contbr_city contbr_st
## RUDOLPH, BONNIE : 463 HOUSTON : 69946 TX:539546
## SCOTT, MELVIN : 304 AUSTIN : 55523
## SCHORLEMER, DAVID S.: 257 DALLAS : 40143
## PHAN, JULIE : 245 SAN ANTONIO: 28860
## GILLIS, DEBORAH DEE : 218 FORT WORTH : 14134
## WEST, LOIS : 216 PLANO : 8605
## (Other) :537843 (Other) :322335
## contbr_zip contbr_employer contbr_occupation
## 77024 : 773 RETIRED :102975 RETIRED :140646
## 77379 : 517 N/A : 55714 NOT EMPLOYED : 23085
## 75225 : 504 SELF-EMPLOYED: 35768 INFORMATION REQUESTED: 16537
## 78633 : 471 SELF EMPLOYED: 19031 ATTORNEY : 14050
## 780451915: 455 NONE : 18012 HOMEMAKER : 11278
## 75093 : 438 (Other) :307561 (Other) :333813
## (Other) :536388 NA's : 485 NA's : 137
## contb_receipt_amt contb_receipt_dt
## Min. :-16600.0 12-JUL-16: 5816
## 1st Qu.: 20.0 11-JUL-16: 5494
## Median : 38.0 29-FEB-16: 5476
## Mean : 139.1 05-APR-16: 4681
## 3rd Qu.: 100.0 31-MAR-16: 4650
## Max. : 15000.0 02-MAY-16: 4279
## (Other) :509150
## receipt_desc memo_cd
## :522170 :437983
## Refund : 3631 X:101563
## REDESIGNATION TO GENERAL : 2927
## REDESIGNATION FROM PRIMARY : 2922
## REDESIGNATION TO CRUZ FOR SENATE: 1764
## REATTRIBUTION TO SPOUSE : 1311
## (Other) : 4821
## memo_text form_tp
## :410387 SA17A:447296
## * EARMARKED CONTRIBUTION: SEE BELOW: 78429 SA18 : 88619
## * HILLARY VICTORY FUND : 33940 SB28A: 3631
## REDESIGNATION TO GENERAL : 2927
## REDESIGNATION FROM PRIMARY : 2922
## REDESIGNATION TO CRUZ FOR SENATE : 1764
## (Other) : 9177
## file_num tran_id election_tp X
## Min. :1003942 SA17.1135539: 4 : 2252 Mode:logical
## 1st Qu.:1077404 SB28A.1269 : 4 G2016:162833 NA's:539546
## Median :1091720 C10688898 : 3 P2012: 2
## Mean :1091567 C10859784 : 3 P2016:374459
## 3rd Qu.:1112134 C10868587 : 3
## Max. :1134173 C10937433 : 3
## (Other) :539526
The bar graph displays the number of donations each candidate received. Only candidates with more than 1,000 donations are shown on the plot. Hillary Clinton received the most donations across Texas, with over 200,000. Ted Cruz came second, followed by Bernie Sanders and Donald Trump. Ted Cruz was a sitting U.S. senator from Texas during the campaign, which may explain his popularity and the number of donations he received. I am curious to see the totals for all the candidates' donations.
## Kasich, John R. Johnson, Gary
## 1187 1336
## Fiorina, Carly Paul, Rand
## 2535 3035
## Bush, Jeb Rubio, Marco
## 3578 8983
## Carson, Benjamin S. Trump, Donald J.
## 23692 69288
## Sanders, Bernard Cruz, Rafael Edward 'Ted'
## 79954 138805
## Clinton, Hillary Rodham
## 203851
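The counts above can be reproduced with a frequency table filtered by a threshold. The following is a minimal base R sketch rather than the report's actual code: the toy rows and the names `min_donations` and `top_candidates` are assumptions made for illustration, while `cand_nm` is the column name from the FEC data.

```r
# Toy stand-in for the FEC data; these rows are invented for illustration.
tx <- data.frame(
  cand_nm = c(rep("Clinton, Hillary Rodham", 5),
              rep("Cruz, Rafael Edward 'Ted'", 3),
              rep("Kasich, John R.", 1)),
  stringsAsFactors = FALSE
)

# Count donations per candidate.
donation_counts <- table(tx$cand_nm)

# Keep candidates above a threshold (1,000 in the report; 2 here because
# the toy data is tiny), then sort ascending as in the printed table.
min_donations <- 2
top_candidates <- sort(donation_counts[donation_counts > min_donations])
print(top_candidates)
```

On the real data, `min_donations` would be set to 1000 to match the cutoff described in the text.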
The cities with the most donations were Houston, Austin, Dallas, San Antonio, Fort Worth, and Plano. Houston is the most populous city in Texas, which could explain its lead. Austin, however, ranks second in donations despite being smaller than San Antonio and Dallas, so population alone does not account for the ordering. I am interested in finding how each city leans politically.
## IRVING AMARILLO THE WOODLANDS GEORGETOWN MIDLAND
## 3562 3682 3772 3775 3879
## FRISCO CORPUS CHRISTI SUGAR LAND LUBBOCK KATY
## 4085 4504 4990 5163 5926
## ARLINGTON EL PASO SPRING PLANO FORT WORTH
## 7101 7564 8017 8605 14134
## SAN ANTONIO DALLAS AUSTIN HOUSTON (Other)
## 28860 40143 55523 69946 117729
I created a new variable, “Donation_Level”, that groups the donation amounts into 5 levels: “$200 and Under”, “$200.1-499”, “$500-999”, “$1000-1999”, and “$2000 and over”. The first plot shows that donations of $200 or less overwhelmingly outnumber all others. I then wanted to see which levels of donation each candidate received most.
In summary, Hillary Clinton received over 175,000 donations of $200 or less. For donations between $500-999, Donald Trump received over 10,500 donations. As the donation value increased, Hillary Clinton and Ted Cruz continued to compete for the most donations.
## $200 and Under $200.1 -499 $500-999 $1000-1999 $2000 and over
## 471235 33724 13193 9287 12107
## TRUE
## 0
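The binning step can be sketched with base R's `cut()`. This is an illustration under assumptions, not the report's code: the exact bin edges are not stated in the text, so the breaks below (which treat amounts as dollars and cents) are a guess that matches the five labels, and the sample amounts are invented.

```r
# Invented sample amounts, including a refund (negative) and edge values.
amounts <- c(-50, 20, 200, 200.01, 499, 500, 999.99, 1000, 2700)

# Assumed bin edges. cut() builds right-closed intervals (a, b] by default,
# so 200 lands in the first bin and 500 in the third.
breaks <- c(-Inf, 200, 499.99, 999.99, 1999.99, Inf)
labels <- c("$200 and Under", "$200.1-499", "$500-999",
            "$1000-1999", "$2000 and over")

donation_level <- cut(amounts, breaks = breaks, labels = labels)

# Frequency of each level, and a check that no amount was left unbinned.
print(table(donation_level))
print(sum(is.na(donation_level)))
```

Using `-Inf` as the lowest edge places refunds (negative amounts) in the "$200 and Under" bin. That is consistent with the five counts above summing to the full 539,546 rows, although the report does not say how refunds were handled.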
Donations Under $200: The top 5 candidates receiving donations of $200 or less are Hillary Clinton (D), Ted Cruz (R), Bernie Sanders (D), Donald Trump (R), and Ben Carson (R).
Donations between $200.1-499: In this higher range, Donald Trump (R) received the most, followed by Hillary Clinton (D), Ted Cruz (R), Ben Carson (R), and Bernie Sanders (D). It is interesting to note that even though Texas is generally considered a “red” or Republican state, Hillary Clinton still received similar numbers of donations to her Republican counterparts, if not more.
Donations between $500-999: At these higher amounts, Donald Trump (R), Ted Cruz (R), and Hillary Clinton (D) still have the highest donation counts. Other candidates, such as Bernie Sanders (D) and Ben Carson (R), received far fewer donations.
Donations between $1000-1999: Donations to Donald Trump (R) drop off in this category, with Hillary Clinton (D) and Ted Cruz (R) receiving the most.
Donations over $2000: Ted Cruz (R) and Hillary Clinton (D) remain close in this category as well, while donations to Donald Trump (R) continue to decrease.
After grouping the data by individual contributor, the plot displays how many donations each donor made. After creating the first plot, I applied a log10 scale to the y axis and set limits on the x axis. There is one noticeable outlier at over 60,000 donations.
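The grouping and log transform described above might look roughly like this in base R. It is a sketch under assumptions: the contributor names are invented, and grouping on `contbr_nm` alone is a simplification, since different people can share a name and the report may have grouped on more fields.

```r
# Invented contributors; contbr_nm is the column name from the FEC data.
tx <- data.frame(
  contbr_nm = c("DOE, JANE", "DOE, JANE", "DOE, JANE",
                "ROE, SAM", "POE, ALEX", "POE, ALEX", "LOE, KIM"),
  stringsAsFactors = FALSE
)

# Number of donations made by each contributor.
donations_per_donor <- table(tx$contbr_nm)

# How many contributors made exactly 1, 2, 3, ... donations.
donor_distribution <- table(as.vector(donations_per_donor))

# A log10 transform of the counts compresses the tall bar of one-time
# donors, which is the effect of a log10 y axis on the histogram.
log_counts <- log10(as.vector(donor_distribution))
print(donor_distribution)
print(round(log_counts, 3))
```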
Contributors can be influenced politically by their lifestyles and occupations, and a person's occupation affects how much they can donate. Here is a breakdown of which occupations donated the most and the least, along with statistics for the top 10 occupations. Retirees contributed the highest number of donations by a wide margin.
## RETIRED
## 140646
## NOT EMPLOYED
## 23085
## INFORMATION REQUESTED
## 16537
## ATTORNEY
## 14050
## HOMEMAKER
## 11278
## ENGINEER
## 8933
## PHYSICIAN
## 8707
## TEACHER
## 7968
## SALES
## 6831
## INFORMATION REQUESTED PER BEST EFFORTS
## 6263
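A ranking like the one above, together with per-occupation summary statistics, can be sketched in base R as follows. The rows, `top_n`, and the column names of `occupation_stats` are assumptions for illustration; `contbr_occupation` and `contb_receipt_amt` are the dataset's own column names.

```r
# Invented rows for illustration.
tx <- data.frame(
  contbr_occupation = c("RETIRED", "RETIRED", "RETIRED",
                        "ATTORNEY", "ATTORNEY", "TEACHER"),
  contb_receipt_amt = c(25, 50, 75, 500, 1000, 20),
  stringsAsFactors = FALSE
)

# Rank occupations by number of donations, most frequent first.
counts <- sort(table(tx$contbr_occupation), decreasing = TRUE)

# Keep the top occupations (10 in the report; 2 for this toy data).
top_n <- 2
top_occupations <- names(counts)[seq_len(top_n)]
top_rows <- tx[tx$contbr_occupation %in% top_occupations, ]

# Summary statistics of the donation amount within each top occupation.
mean_amt <- tapply(top_rows$contb_receipt_amt, top_rows$contbr_occupation, mean)
median_amt <- tapply(top_rows$contb_receipt_amt, top_rows$contbr_occupation, median)

occupation_stats <- data.frame(
  donations = as.vector(counts[top_occupations]),
  mean_amt = as.vector(mean_amt[top_occupations]),
  median_amt = as.vector(median_amt[top_occupations]),
  row.names = top_occupations
)
print(occupation_stats)
```

Note that the real data contains near-duplicate labels, such as "INFORMATION REQUESTED" and "INFORMATION REQUESTED PER BEST EFFORTS", which a ranking like this counts separately.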
The dataset, TX, contains 19 variables with nearly 540,000 observations.
I am very interested in examining whether living in a particular city influences whom a contributor donates to. I also want to further explore the backgrounds of the contributors (such as amount donated, occupation, and number of donations), which will shed more light on the voters and donors in Texas.
Yes, as mentioned above, I created the donation_level variable to group donation values into bins.
I applied log10 to the y axis of the last plot, which examined the number of donations each contributor made. The original graph was extremely skewed to the right, and I wanted to gain more insight into the plotted data. I also adjusted the x-axis using coord_cartesian().
After exploring candidate donations by bucket, I revised the bar graph to better visualize the number of donations received by the top 4 candidates.
I also included a box plot for the candidates that summarizes their contribution amounts. I applied log10 to the y axis to make the ranges of donation amounts easier to compare. One finding is that Donald Trump had the highest median donation value. This was expected, since Donald Trump received higher-valued donations in greater numbers than lower-valued ones.
The plots below explore mean donation and number of donations per candidate. The correlation between the two variables was -0.384, a weak negative relationship that falls just short of significance at the 5% level (p = 0.058). As the number of donations increased, mean_donation tended to decrease.
##
## Pearson's product-moment correlation
##
## data: tx_candidates$number_of_donations and tx_candidates$mean_donation
## t = -1.9938, df = 23, p-value = 0.05816
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.67641730 0.01325091
## sample estimates:
## cor
## -0.38389
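The per-candidate aggregation behind this test can be sketched in base R. The data below is invented, so the resulting correlation is not the report's; only the names `tx_candidates`, `number_of_donations`, and `mean_donation` come from the output above.

```r
# Invented donations for four hypothetical candidates A-D.
tx <- data.frame(
  cand_nm = c("A", "A", "A", "A", "B", "B", "B", "C", "C", "D"),
  contb_receipt_amt = c(10, 20, 30, 40, 50, 100, 150, 200, 400, 1000),
  stringsAsFactors = FALSE
)

# One row per candidate: how many donations, and their mean amount.
n_by_cand <- tapply(tx$contb_receipt_amt, tx$cand_nm, length)
mean_by_cand <- tapply(tx$contb_receipt_amt, tx$cand_nm, mean)

tx_candidates <- data.frame(
  cand_nm = names(n_by_cand),
  number_of_donations = as.integer(n_by_cand),
  mean_donation = as.numeric(mean_by_cand)
)

# Pearson correlation test between the two per-candidate summaries.
result <- cor.test(tx_candidates$number_of_donations,
                   tx_candidates$mean_donation)
print(result$estimate)
print(result$parameter)  # degrees of freedom = number of candidates - 2
```

Because the degrees of freedom equal the number of groups minus two, the `df = 23` in the output above implies that 25 candidates entered the test.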
Instead of grouping by candidate as in the previous plot, I wanted to explore whether the same negative trend appeared when grouping by city. Here the correlation was much closer to zero, at -0.0089 (p = 0.68).
##
## Pearson's product-moment correlation
##
## data: tx_cities$number_of_donations and tx_cities$mean_donation
## t = -0.41871, df = 2220, p-value = 0.6755
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.05045083 0.03270921
## sample estimates:
## cor
## -0.008886173
It was determined earlier that Hillary Clinton received the largest share of donations, with Donald Trump coming fourth. Since these two candidates were opponents in the presidential election, I wanted to further explore the breakdown by Texan cities.
Mean donations for Hillary Clinton and Donald Trump:
## [1] 91.74212
## [1] 31.18272
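A city-level comparison of the two general-election candidates can be tabulated with a two-way frequency table. The sketch below uses invented rows; `finalists` and `city_by_candidate` are hypothetical names, while `contbr_city` and `cand_nm` are columns of the dataset.

```r
# Invented rows for illustration.
tx <- data.frame(
  contbr_city = c("HOUSTON", "HOUSTON", "HOUSTON", "AUSTIN",
                  "AUSTIN", "DALLAS", "DALLAS", "DALLAS"),
  cand_nm = c("Clinton, Hillary Rodham", "Clinton, Hillary Rodham",
              "Trump, Donald J.", "Clinton, Hillary Rodham",
              "Sanders, Bernard", "Trump, Donald J.",
              "Clinton, Hillary Rodham", "Clinton, Hillary Rodham"),
  stringsAsFactors = FALSE
)

# Restrict to the two general-election candidates.
finalists <- c("Clinton, Hillary Rodham", "Trump, Donald J.")
two_way <- tx[tx$cand_nm %in% finalists, ]

# Rows are cities, columns are candidates, cells are donation counts.
city_by_candidate <- table(two_way$contbr_city, two_way$cand_nm)
print(city_by_candidate)
```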
I was eager to visualize the relationship between total_employees and contributions_total, because occupations have varying salaries that affect how much an individual can contribute. For example, a teacher would likely have less to contribute than a CEO. To my surprise, the plot showed that as total_employees increased, so did contributions_total (with a correlation of 0.9295151).
##
## Pearson's product-moment correlation
##
## data: tx_occupations$total_employees and tx_occupations$contributions_total
## t = 330.18, df = 17161, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.9274517 0.9315219
## sample estimates:
## cor
## 0.9295151
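The `t` values in the three test outputs follow from the correlation and the degrees of freedom through the standard relation t = r * sqrt(df) / sqrt(1 - r^2). The short check below plugs in the rounded values printed above; the helper name `t_from_r` is made up for this sketch.

```r
# t statistic for a Pearson correlation r with df = n - 2.
t_from_r <- function(r, df) r * sqrt(df) / sqrt(1 - r^2)

# Values copied from the three cor.test outputs in this report.
t_candidates  <- t_from_r(-0.38389, 23)
t_cities      <- t_from_r(-0.008886173, 2220)
t_occupations <- t_from_r(0.9295151, 17161)

# Two-sided p-value for the candidate-level test.
p_candidates <- 2 * pt(-abs(t_candidates), df = 23)

print(round(c(t_candidates, t_cities, t_occupations), 4))
print(round(p_candidates, 5))
```

The comparison shows why the occupation-level result is so decisive: with more than 17,000 groups even a modest correlation would be significant, whereas the candidate-level test rests on only 25 groups.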
I wanted to measure how number of donations and mean donation were related when grouping by candidate or by contributor city. Across candidates, there was a weak negative relationship between the two variables (r = -0.38, p = 0.058): as the number of donations increased, the mean donation decreased. A possible explanation is that candidates with many donations received mostly low-value ones, which pulls the mean down. This matches the earlier finding that the majority of donations were $200 or less. Grouping by city produced a different pattern. The correlation was close to 0, and most mean donation values were $250 or less regardless of the number of donations made.
Another relationship I examined was cities' support for particular candidates. It was determined earlier that Houston, Austin, and Dallas had the highest number of donations, but I wanted to explore which candidate they favored. To my surprise, in each of those cities contributors donated substantially more to Hillary Clinton than to Donald Trump, despite Donald Trump winning the popular vote in Texas. Lastly, I created a scatter plot of the number of employees per occupation vs. the total contributions made. Log10 x and y scales were applied to better visualize the data, and the plot showed that as total employees increased, so did the contribution amount. The Pearson correlation for the trend was 0.93.
I took the last bivariate plot and mapped the number of donations to color to examine whether it could explain the high correlation between contribution totals and total employees. I did not apply the alpha parameter, so that the color ranges on the plot would be easier to see. The number of donations appears to have little effect on the relationship, since there is so little variation in color.
For each city, I wanted to see the mean_donation and its relationship with the number of donations. Houston had the most donations but a lower mean than Dallas.
For the first graph, the variables investigated were total_employees, contribution_total, and number_of_donations. In the last section, total_employees and contribution_total were determined to have a strong relationship, and I wanted to see if number_of_donations had any influence. Upon further examination, there was very little variation in color (which encodes number_of_donations), meaning the variable had little influence.
The second graph analyzed number_of_donations, contbr_city, and mean_donation. I was surprised to see so much variation between number_of_donations and mean_donation. For example, Houston had the most donations and also a higher mean than Austin (which came second in number of donations). I would be eager to compare this graph with mean salaries for individual cities in Texas.
This bar graph examines how many donations of each value level each candidate received. The level most often given by contributors was $200 and Under, and Hillary Clinton and Ted Cruz were the top 2 candidates receiving donations of this size. Ted Cruz and Hillary Clinton remain popular recipients at the other levels. However, Donald Trump received more donations valued $200.1-499 than any other candidate, even though donations of that size were far less common. Bernie Sanders does not lead in any of the bars, yet he received more donations than Donald Trump in total.
The plot examines the relationship between the number of employees and the total contribution amount in a given occupation. As the number of employees in a profession increased, the total donation amount also increased. It is important to note that all occupations, with their varying salaries, were included in the plot, which could mean that salary did not influence donation amount. More data on contributors' salaries would be required to confirm this finding.
The final variable explored further was Texan cities. Total donations and mean donations were examined for each city. Bearing in mind that each city has its own population size and culture, the results showed that cities contributed different numbers of donations and had varying means. Midland, for example, had under 5,000 donations, but its mean donation amount was around $300. Houston, on the other hand, had over 60,000 donations, but its mean donation was around $200.
When conducting my analysis, I initially ran into some difficulties dealing with mostly qualitative data. The cities, candidates, contributor names, and contributor occupations are some of the variables I had extensive data on. I found a lot of success when I created the new “donation_level” variable to group the various donation amounts. It was easier to conduct analysis across the other variables with a cleaned variable. The data could be enriched with additional information about the contributors, such as salary and usual political affiliation. I would not be surprised if some usual Republican voters voted for Hillary Clinton in this particular election.